[KYUUBI #7028] Persist the kubernetes application terminate state into metastore for app info store fallback #7029
Codecov Report

| | master | #7029 | +/- |
|---|---|---|---|
| Coverage | 0.00% | 0.00% | |
| Files | 695 | 696 | +1 |
| Lines | 42833 | 42977 | +144 |
| Branches | 5833 | 5839 | +6 |
| Misses | 42833 | 42977 | +144 |
Testing passed, cc @pan3793
This PR has been well tested. cc @pan3793
DB schema LGTM; the upsert implementation may have room for improvement.
|VALUES (${colsToInsert.map(_ => "?").mkString(",")})
|ON CONFLICT ($keyCol)
|DO UPDATE SET
|${colsToReplace.map(c => s"$c = EXCLUDED.$c").mkString(",")}
|INSERT INTO $table (${colsToInsert.mkString(",")})
|VALUES (${colsToInsert.map(_ => "?").mkString(",")})
|ON DUPLICATE KEY UPDATE
|${colsToReplace.map(c => s"$c = VALUES($c)").mkString(",")}
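The two fragments quoted in the review differ only in their conflict clause. A minimal sketch of how such dialect-specific upsert statements could be assembled is shown here. It is not the code merged in the PR: the object name `UpsertSqlSketch`, its method names, and the table and column names used in the usage note are assumptions made for illustration.

```scala
// Illustrative sketch only: the object and method names are hypothetical
// and are not the dialect API merged in the PR.
object UpsertSqlSketch {

  // Shared "INSERT INTO ... VALUES (?, ...)" prefix, one JDBC placeholder per column.
  private def insertPrefix(table: String, colsToInsert: Seq[String]): Seq[String] = {
    val cols = colsToInsert.mkString(",")
    val placeholders = colsToInsert.map(_ => "?").mkString(",")
    Seq(s"INSERT INTO $table ($cols)", s"VALUES ($placeholders)")
  }

  // PostgreSQL form: on a key conflict, overwrite the listed columns with the
  // values of the row that was proposed for insertion (EXCLUDED).
  def postgres(
      table: String,
      colsToInsert: Seq[String],
      keyCol: String,
      colsToReplace: Seq[String]): String = {
    val updates = colsToReplace.map(c => s"$c = EXCLUDED.$c").mkString(",")
    (insertPrefix(table, colsToInsert) ++
      Seq(s"ON CONFLICT ($keyCol)", "DO UPDATE SET", updates)).mkString("\n")
  }

  // MySQL form as quoted in the review. The commit list mentions that the
  // VALUES(col) syntax was later avoided because MySQL deprecates it.
  def mysql(
      table: String,
      colsToInsert: Seq[String],
      colsToReplace: Seq[String]): String = {
    val updates = colsToReplace.map(c => s"$c = VALUES($c)").mkString(",")
    (insertPrefix(table, colsToInsert) ++
      Seq("ON DUPLICATE KEY UPDATE", updates)).mkString("\n")
  }
}
```

For example, `UpsertSqlSketch.postgres("k8s_engine_info", Seq("identifier", "state"), "identifier", Seq("state"))` would produce an `INSERT ... ON CONFLICT (identifier) DO UPDATE SET state = EXCLUDED.state` statement; the table and column names in this call are made up for the example.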
LGTM, except for the dialect API style
…o metastore for app info store fallback

### Why are the changes needed?

1. Persist the Kubernetes application termination info into the metastore to prevent event loss.
2. If the application info cannot be found in the informer application info store, fall back to reading it from the metastore instead of returning NOT_FOUND directly.
3. This is critical because returning a wrong application state might cause data quality issues.

### How was this patch tested?

UT and IT.

<img width="1917" alt="image" src="https://github.com/user-attachments/assets/306f417c-5037-4869-904d-dcf657ff8f60" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7029 from turboFei/kubernetes_state.

Closes #7028

9f2bade [Wang, Fei] generic dialect
186cc69 [Wang, Fei] nit
82ea626 [Wang, Fei] Add pod name
4c59beb [Wang, Fei] Refine
327a0d5 [Wang, Fei] Remove create_time from k8s engine info
12c24b1 [Wang, Fei] do not use MYSQL deprecated VALUES(col)
becf9d1 [Wang, Fei] insert or replace
d167623 [Wang, Fei] migration

Authored-by: Wang, Fei <[email protected]>
Signed-off-by: Wang, Fei <[email protected]>
(cherry picked from commit 02a6b13)
Signed-off-by: Wang, Fei <[email protected]>
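The fallback described in the commit message above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `AppInfoFallbackSketch`, `AppInfo`, `lookup`, and the two store functions are hypothetical names, and the real stores are modeled as plain functions returning `Option`.

```scala
// Illustrative sketch only: all names here are hypothetical stand-ins for
// the informer-backed store and the metadata store mentioned in the PR.
object AppInfoFallbackSketch {
  sealed trait AppState
  case object Running extends AppState
  case object Finished extends AppState
  case object Failed extends AppState
  case object NotFound extends AppState

  final case class AppInfo(tag: String, state: AppState)

  // Look in the in-memory informer store first, then in the persisted
  // metadata store, and report NotFound only when both lookups miss.
  def lookup(
      tag: String,
      informerStore: String => Option[AppInfo],
      metadataStore: String => Option[AppInfo]): AppInfo =
    informerStore(tag)
      .orElse(metadataStore(tag))
      .getOrElse(AppInfo(tag, NotFound))
}
```

Using `orElse` keeps the metadata store lookup lazy, so it is only consulted when the informer store has no entry for the tag.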
Thanks, merged to 1.11.0
…to prevent data quality issue

### Why are the changes needed?

Currently, the NOT_FOUND application state is treated as a terminated but not failed state. This might cause data quality issues if a downstream application depends on the batch state for data processing. So, I think we should treat NOT_FOUND as a failed state instead.

Currently, we support 3 types of application manager:

1. [JpsApplicationOperation](https://github.com/apache/kyuubi/blob/master/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/JpsApplicationOperation.scala)
2. [YarnApplicationOperation](https://github.com/apache/kyuubi/blob/master/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/YarnApplicationOperation.scala)
3. [KubernetesApplicationOperation](https://github.com/apache/kyuubi/blob/master/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala)

YarnApplicationOperation and KubernetesApplicationOperation are widely used in production. In multiple Kyuubi instance mode, the NOT_FOUND case should rarely happen:

1. https://github.com/apache/kyuubi/blob/7e199d6fdbdf52222bb3eadd056b9e5a2295f36e/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/BatchesResource.scala#L369-L385
2. #7029

So, I think we should treat NOT_FOUND as a failed state in production use cases. It is better to fail some corner cases than to mistakenly set unsuccessful batches to the finished state.

### How was this patch tested?

GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7033 from turboFei/revist_not_found.

Closes #7033

ada4f88 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ApplicationOperation.scala
985e23c [Wang, Fei] Refine
f03d612 [Wang, Fei] comments
b9d6ac2 [Wang, Fei] incase the metadata updated by peer instance
3bd61ca [Wang, Fei] add
339df47 [Wang, Fei] treat NOT_FOUND as failed

Lead-authored-by: Wang, Fei <[email protected]>
Co-authored-by: Cheng Pan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
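The behavior change argued for above can be illustrated with a small sketch. It assumes a simplified state model: `NotFoundAsFailedSketch`, its state objects, and the batch state strings returned by `batchState` are hypothetical and only mirror the idea of counting NOT_FOUND as failed, not the actual Kyuubi classes.

```scala
// Illustrative sketch only: a simplified, hypothetical state model.
object NotFoundAsFailedSketch {
  sealed trait AppState
  case object Pending extends AppState
  case object Running extends AppState
  case object Finished extends AppState
  case object Killed extends AppState
  case object Failed extends AppState
  case object NotFound extends AppState

  // NotFound already counted as terminated before the change.
  def isTerminated(state: AppState): Boolean = state match {
    case Finished | Killed | Failed | NotFound => true
    case Pending | Running => false
  }

  // After the change, NotFound is also counted as failed, so a batch whose
  // application cannot be found is no longer reported as finished.
  def isFailed(state: AppState): Boolean = state match {
    case Killed | Failed | NotFound => true
    case _ => false
  }

  // Hypothetical mapping from application state to a batch state label.
  def batchState(state: AppState): String =
    if (isFailed(state)) "ERROR"
    else if (isTerminated(state)) "FINISHED"
    else "RUNNING"
}
```

Checking `isFailed` before `isTerminated` is what prevents a missing application from being reported as a successful batch in this sketch.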